active party
Proto-EVFL: Enhanced Vertical Federated Learning via Dual Prototype with Extremely Unaligned Data
Guo, Wei, Duan, Yiyang, Hu, Zhaojun, Tong, Yiqi, Zhuang, Fuzhen, Zhang, Xiao, Dong, Jin, Wu, Ruofan, Liu, Tengfei, Sun, Yifan
In vertical federated learning (VFL), multiple enterprises address aligned sample scarcity by leveraging massive locally unaligned samples to facilitate collaborative learning. However, unaligned samples across different parties in VFL can be extremely class-imbalanced, leading to insufficient feature representation and limited model prediction space. Specifically, class-imbalance problems comprise intra-party class imbalance and inter-party class imbalance, which cause local model bias and feature contribution inconsistency, respectively. To address these challenges, we propose Proto-EVFL, an enhanced VFL framework via dual prototypes. We first introduce class prototypes for each party to learn relationships between classes in the latent space, allowing the active party to predict unseen classes. We further design a probabilistic dual prototype learning scheme to dynamically select unaligned samples by conditional optimal transport cost with class prior probability. Moreover, a mixed prior guided module guides this selection process by combining local and global class prior probabilities. Finally, we adopt an adaptive gated feature aggregation strategy to mitigate feature contribution inconsistency by dynamically weighting and aggregating local features across different parties. We prove that Proto-EVFL, as the first bi-level optimization framework in VFL, has a convergence rate of $1/T$. Even in a zero-shot scenario with one unseen class, it outperforms baselines by at least 6.97%.
* represents the corresponding authors. Wei Guo, Yiyang Duan and Fuzhen Zhuang are with the School of Artificial Intelligence, Beihang University, Beijing 100083, China (e-mail: {guowei, duanyiyang, zhuangfuzhen}@buaa.edu.cn). Xiao Zhang is with the School of Computer Science and Technology, Shandong University, Shandong 266237, China (e-mail: xiaozhang@sdu.edu.cn). Zhaojun Hu is with the Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing 100872, China (e-mail: huzhaojun@ruc.edu.cn).
- Information Technology > Security & Privacy (1.00)
- Education (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
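The class-prototype idea mentioned in the Proto-EVFL abstract above can be illustrated with a generic nearest-prototype classifier. This is a minimal sketch under stated assumptions, not the paper's dual prototype scheme: each prototype is simply the mean embedding of a class, the distance is Euclidean, and the function names and toy numbers are hypothetical.

```python
from collections import defaultdict
from math import dist


def class_prototypes(embeddings, labels):
    """Return {label: mean embedding} over the labeled samples."""
    grouped = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(prototypes, embedding):
    """Assign the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: dist(prototypes[label], embedding))


# Toy 2-D embeddings for two classes (made-up numbers).
embeddings = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labels = ["a", "b", "b", "b"][:1] + ["a", "b", "b"]  # -> ["a", "a", "b", "b"]
prototypes = class_prototypes(embeddings, labels)
prediction = predict(prototypes, [4.5, 3.0])
```

Because prediction depends only on distances to prototypes in the latent space, a prototype for a class can in principle be used even when the predicting party holds no samples of that class; how Proto-EVFL obtains such prototypes is not specified in the abstract.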
VFL-RPS: Relevant Participant Selection in Vertical Federated Learning
Khan, Afsana, Thij, Marijn ten, Tang, Guangzhi, Wilbik, Anna
Federated Learning (FL) allows collaboration between different parties, while ensuring that the data across these parties is not shared. However, not every collaboration is helpful in terms of the resulting model performance. Therefore, it is an important challenge to select the correct participants in a collaboration. As it currently stands, most of the efforts in participant selection in the literature have focused on Horizontal Federated Learning (HFL), which assumes that all features are the same across all participants, disregarding the possibility of different features across participants, which is captured in Vertical Federated Learning (VFL). To close this gap in the literature, we propose a novel method, VFL-RPS, for participant selection in VFL, as a pre-training step. We have tested our method on several data sets for both regression and classification tasks, showing that it achieves results comparable to using all data while selecting only a few participants. In addition, we show that our method outperforms existing methods for participant selection in VFL.
- Europe > Netherlands > Limburg > Maastricht (0.05)
- North America > United States > California (0.05)
- Oceania > Australia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.68)
Towards Active Participant-Centric Vertical Federated Learning: Some Representations May Be All You Need
Irureta, Jon, Imaz, Jon, Lojo, Aizea, González, Marco, Perona, Iñigo
Vertical Federated Learning (VFL) enables collaborative model training across different participants with distinct features and common samples, while preserving data privacy. Existing VFL methodologies often struggle with realistic data partitions, typically incurring high communication costs and significant operational complexity. In this work, we introduce a novel simplified approach to VFL, Active Participant-Centric VFL (APC-VFL), that, to the best of our knowledge, is the first to require only a single communication round between participants, and allows the active participant to perform inference in a non-collaborative fashion. This method integrates unsupervised representation learning with knowledge distillation to achieve accuracy comparable to traditional VFL methods based on vertical split learning in classical settings, reducing required communication rounds by up to $4200\times$, while being more flexible. Our approach also shows improvements compared to non-federated local models, as well as a comparable VFL proposal, VFedTrans, offering an efficient and flexible solution for collaborative learning.
- North America > United States > Wisconsin (0.05)
- Europe > Spain > Basque Country (0.04)
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- Health & Medicine (0.95)
- Information Technology > Security & Privacy (0.68)
Backdoor Attack on Vertical Federated Graph Neural Network Learning
Yang, Jirui, Chen, Peng, Lu, Zhihui, Deng, Ruijun, Duan, Qiang, Zeng, Jianping
Federated Graph Neural Network (FedGNN) is a privacy-preserving machine learning technology that combines federated learning (FL) and graph neural networks (GNNs). It offers a privacy-preserving solution for training GNNs using isolated graph data. Vertical Federated Graph Neural Network (VFGNN) is an important branch of FedGNN, where data features and labels are distributed among participants, and each participant has the same sample space. Due to the difficulty of accessing and modifying distributed data and labels, the vulnerability of VFGNN to backdoor attacks remains largely unexplored. In this context, we propose BVG, the first method for backdoor attacks in VFGNN. Without accessing or modifying labels, BVG uses multi-hop triggers and requires only four target class nodes for an effective backdoor attack. Experiments show that BVG achieves high attack success rates (ASR) across three datasets and three different GNN models, with minimal impact on main task accuracy (MTA). We also evaluate several defense methods, further validating the robustness and effectiveness of BVG. This finding also highlights the need for advanced defense mechanisms to counter sophisticated backdoor attacks in practical VFGNN applications.
- North America > United States > Pennsylvania (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
A few-shot Label Unlearning in Vertical Federated Learning
Gu, Hanlin, Tae, Hong Xi, Chan, Chee Seng, Fan, Lixin
This paper addresses the critical challenge of unlearning in Vertical Federated Learning (VFL), an area that has received limited attention compared to horizontal federated learning. We introduce the first approach specifically designed to tackle label unlearning in VFL, focusing on scenarios where the active party aims to mitigate the risk of label leakage. Our method leverages a limited amount of labeled data, utilizing manifold mixup to augment the forward embedding of insufficient data, followed by gradient ascent on the augmented embeddings to erase label information from the models. This combination of augmentation and gradient ascent enables high unlearning effectiveness while maintaining efficiency, completing the unlearning procedure within seconds. Extensive experiments conducted on diverse datasets, including MNIST, CIFAR10, CIFAR100, and ModelNet, validate the efficacy and scalability of our approach. This work represents a significant advancement in federated learning, addressing the unique challenges of unlearning in VFL while preserving both privacy and computational efficiency.
- North America > United States > California (0.14)
- Asia > Malaysia (0.14)
- Asia > China (0.04)
- Information Technology > Security & Privacy (1.00)
- Law (0.68)
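The two steps named in the label-unlearning abstract above, mixup on embeddings followed by gradient ascent, can be sketched on a toy model. Everything here is an assumption for illustration: a binary linear head stands in for the real model, the embeddings and weights are made-up numbers, the Beta(2, 2) mixing distribution and step size are arbitrary choices, and the paper's actual objective and architecture are not reproduced.

```python
import math
import random


def mixup(h_a, y_a, h_b, y_b, lam):
    """Interpolate two embeddings and their labels with weight lam."""
    h = [lam * a + (1 - lam) * b for a, b in zip(h_a, h_b)]
    return h, lam * y_a + (1 - lam) * y_b


def predict(w, h):
    """Sigmoid output of a linear head w applied to embedding h."""
    return 1.0 / (1.0 + math.exp(-sum(wi * hi for wi, hi in zip(w, h))))


def loss(w, data):
    """Mean binary cross-entropy over (embedding, soft label) pairs."""
    total = 0.0
    for h, y in data:
        p = predict(w, h)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def grad(w, data):
    """Gradient of the mean cross-entropy with respect to w."""
    g = [0.0] * len(w)
    for h, y in data:
        err = predict(w, h) - y
        for i, hi in enumerate(h):
            g[i] += err * hi / len(data)
    return g


rng = random.Random(0)
# A handful of labeled embeddings (hypothetical values).
few_shot = [([1.0, 0.0], 1.0), ([0.9, 0.2], 1.0),
            ([0.0, 1.0], 0.0), ([0.2, 0.9], 0.0)]

# Step 1: augment the scarce embeddings by mixing random pairs.
augmented = list(few_shot)
for _ in range(8):
    (h_a, y_a), (h_b, y_b) = rng.sample(few_shot, 2)
    augmented.append(mixup(h_a, y_a, h_b, y_b, rng.betavariate(2.0, 2.0)))

# Step 2: gradient ASCENT on the augmented set (note the plus sign).
w = [2.0, -2.0]  # stands in for a trained head
loss_before = loss(w, augmented)
for _ in range(5):
    w = [wi + 0.5 * gi for wi, gi in zip(w, grad(w, augmented))]
loss_after = loss(w, augmented)
```

Ascending the loss pushes the head away from the labels it had fitted, which is the sense in which label information is erased in this toy setting; the cross-entropy of a linear head is convex in `w`, so each ascent step raises the loss on the augmented set.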
HSTFL: A Heterogeneous Federated Learning Framework for Misaligned Spatiotemporal Forecasting
Spatiotemporal forecasting has emerged as an indispensable building block of diverse smart city applications, such as intelligent transportation and smart energy management. Recent advancements have uncovered that the performance of spatiotemporal forecasting can be significantly improved by integrating knowledge in geo-distributed time series data from different domains, e.g., enhancing real-estate appraisal with human mobility data, or jointly predicting taxi and bike demand. While effective, existing approaches assume a centralized data collection and exploitation environment, overlooking the privacy and commercial interest concerns associated with data owned by different parties. In this paper, we investigate multi-party collaborative spatiotemporal forecasting without direct access to multi-source private data. However, this task is challenging due to 1) cross-domain feature heterogeneity and 2) cross-client geographical heterogeneity, where standard horizontal or vertical federated learning is inapplicable. To this end, we propose a Heterogeneous SpatioTemporal Federated Learning (HSTFL) framework to enable multiple clients to collaboratively harness geo-distributed time series data from different domains while preserving privacy. Specifically, we first devise vertical federated spatiotemporal representation learning to locally preserve spatiotemporal dependencies among individual participants and generate effective representations for heterogeneous data. Then we propose a cross-client virtual node alignment block to incorporate cross-client spatiotemporal dependencies via a multi-level knowledge fusion scheme. Extensive privacy analysis and experimental evaluations demonstrate that HSTFL not only effectively resists inference attacks but also provides a significant improvement against various baselines.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > China > Beijing > Beijing (0.05)
- Information Technology > Security & Privacy (1.00)
- Transportation > Ground > Road (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.68)
FedAdOb: Privacy-Preserving Federated Deep Learning with Adaptive Obfuscation
Gu, Hanlin, Luo, Jiahuan, Kang, Yan, Yao, Yuan, Zhu, Gongxi, Li, Bowen, Fan, Lixin, Yang, Qiang
Federated learning (FL) has emerged as a collaborative approach that allows multiple clients to jointly learn a machine learning model without sharing their private data. The concern about privacy leakage, albeit demonstrated under specific conditions [1], has triggered numerous follow-up studies on designing powerful attack methods and effective defense mechanisms that aim to thwart these attacks. Nevertheless, the privacy-preserving mechanisms employed in these defenses invariably compromise model performance because a fixed obfuscation is applied to private data or gradients. In this article, we therefore propose a novel adaptive obfuscation mechanism, coined FedAdOb, to protect private data without sacrificing the original model performance. Technically, FedAdOb utilizes passport-based adaptive obfuscation to ensure data privacy in both horizontal and vertical federated learning settings. The privacy-preserving capabilities of FedAdOb, specifically with regard to private features and labels, are theoretically proven through Theorems 1 and 2. Furthermore, extensive experimental evaluations conducted on various datasets and network architectures demonstrate the effectiveness of FedAdOb, which achieves a better trade-off between privacy preservation and model performance than existing methods.
Federated Learning (FL) offers a privacy-preserving framework that allows multiple organizations to jointly build global models without disclosing private datasets [2], [3], [4], [5]. Two distinct paradigms have been proposed in the context of FL [5]: Horizontal Federated Learning (HFL) and Vertical Federated Learning (VFL). HFL focuses on scenarios where multiple entities have similar features but different samples. It is suitable for cases where data sources are distributed, such as healthcare institutions contributing patient data for disease prediction.
On the other hand, VFL addresses situations where entities hold different attributes or features of the same samples. This approach is useful in scenarios like combining demographic information from banks with call records from telecom companies to predict customer behavior. Since the introduction of HFL and VFL, studies have highlighted the existence of privacy risks in specific scenarios.
- Asia > China > Hong Kong (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Data Science > Data Mining > Big Data (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Vertical Federated Learning Hybrid Local Pre-training
Li, Wenguo, Guo, Xinling, Jiao, Xu, Huang, Tiancheng, Yan, Xiaoran, Yang, Yao
Vertical Federated Learning (VFL), which has a broad range of real-world applications, has received much attention in both academia and industry. Enterprises aspire to exploit more valuable features of the same users from diverse departments to improve their models' predictive performance. VFL addresses this demand while protecting individual parties from exposing their raw data. However, conventional VFL encounters a bottleneck as it only leverages aligned samples, whose size shrinks with more parties involved, resulting in data scarcity and the waste of unaligned data. To address this problem, we propose a novel VFL Hybrid Local Pre-training (VFLHLP) approach. VFLHLP first pre-trains local networks on the local data of participating parties. Then it utilizes these pre-trained networks to adjust the sub-model for the labeled party or enhance representation learning for other parties during downstream federated learning on aligned data, boosting the performance of federated models. The experimental results on real-world advertising datasets demonstrate that our approach outperforms baseline methods by large margins. The ablation study further illustrates the contribution of each technique in VFLHLP to its overall performance.
- Asia > China > Zhejiang Province > Hangzhou (0.05)
- North America > United States > Virginia (0.04)
Hyperparameter Optimization for SecureBoost via Constrained Multi-Objective Federated Learning
Kang, Yan, Ren, Ziyao, Fan, Lixin, Yang, Linghua, Tong, Yongxin, Yang, Qiang
SecureBoost is a tree-boosting algorithm that leverages homomorphic encryption (HE) to protect data privacy in vertical federated learning. SecureBoost and its variants have been widely adopted in fields such as finance and healthcare. However, the hyperparameters of SecureBoost are typically configured heuristically to optimize model performance (i.e., utility) alone, assuming that privacy is secured. Our study found that SecureBoost and some of its variants are still vulnerable to label leakage. This vulnerability may lead the current heuristic hyperparameter configuration of SecureBoost to a suboptimal trade-off between utility, privacy, and efficiency, which are pivotal elements of a trustworthy federated learning system. To address this issue, we propose the Constrained Multi-Objective SecureBoost (CMOSB) algorithm, which aims to approximate Pareto optimal solutions, where each solution is a set of hyperparameters achieving an optimal trade-off between utility loss, training cost, and privacy leakage. We design measurements of the three objectives, including a novel label inference attack named instance clustering attack (ICA) to measure the privacy leakage of SecureBoost. Additionally, we provide two countermeasures against ICA. The experimental results demonstrate that CMOSB yields superior hyperparameters over those optimized by grid search and Bayesian optimization regarding the trade-off between utility loss, training cost, and privacy leakage.
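The notion of Pareto optimal hyperparameter sets in the CMOSB abstract above can be made concrete with a plain non-dominated filter over three objectives to be minimized. This is only a sketch of Pareto dominance, not the CMOSB search procedure or its constraints; the configuration names and objective values are invented for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates whose objective vectors no other candidate dominates."""
    return {
        name: objs
        for name, objs in candidates.items()
        if not any(dominates(other, objs) for other in candidates.values())
    }


# Hypothetical configs -> (utility loss, training cost, privacy leakage),
# all three to be minimized.
candidates = {
    "depth3_lr0.3": (0.10, 5.0, 0.40),
    "depth5_lr0.1": (0.08, 9.0, 0.55),
    "depth5_lr0.3": (0.12, 9.5, 0.60),  # worse than depth3_lr0.3 everywhere
    "depth2_lr0.3": (0.15, 3.0, 0.20),
}
front = pareto_front(candidates)
```

Each surviving configuration is better than every other survivor on at least one objective, so choosing among them is a question of which trade-off between utility, cost, and privacy is acceptable.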
VFedMH: Vertical Federated Learning for Training Multiple Heterogeneous Models
Wang, Shuo, Gai, Keke, Yu, Jing, Zhu, Liehuang, Choo, Kim-Kwang Raymond, Xiao, Bin
Vertical federated learning has garnered significant attention as it allows clients to train machine learning models collaboratively without sharing local data, which protects the client's local private data. However, existing VFL methods face challenges when dealing with heterogeneous local models among participants, which affects optimization convergence and generalization. To address this challenge, this paper proposes a novel approach called Vertical federated learning for training multiple Heterogeneous models (VFedMH). VFedMH focuses on aggregating the local embeddings of each participant's knowledge during forward propagation. To protect the participants' local embedding values, we propose an embedding protection method based on lightweight blinding factors. In particular, participants obtain local embedding using local heterogeneous models. Then the passive party, who owns only features of the sample, injects the blinding factor into the local embedding and sends it to the active party. The active party aggregates local embeddings to obtain global knowledge embeddings and sends them to passive parties. The passive parties then utilize the global embeddings to propagate forward on their local heterogeneous networks. However, the passive party does not own the sample labels, so the local model gradient cannot be calculated locally. To overcome this limitation, the active party assists the passive party in computing its local heterogeneous model gradients. Then, each participant trains their local model using the heterogeneous model gradients. The objective is to minimize the loss value of their respective local heterogeneous models. Extensive experiments are conducted to demonstrate that VFedMH can simultaneously train multiple heterogeneous models with heterogeneous optimization and outperform some recent methods in model performance.
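The blinding step in the VFedMH abstract above can be illustrated with pairwise masks that cancel when the blinded embeddings are summed. The abstract does not say how the blinding factors are constructed, so the cancelling-mask construction, the integer (fixed-point) embeddings, the modulus, and all names below are assumptions borrowed from generic secure-aggregation practice rather than the paper's protocol.

```python
import random

MODULUS = 2**32  # assumed ring size for fixed-point embedding values


def pairwise_masks(num_parties, dim, rng):
    """Build one mask per party so that all masks sum to zero mod MODULUS."""
    masks = [[0] * dim for _ in range(num_parties)]
    for i in range(num_parties):
        for j in range(i + 1, num_parties):
            # Parties i and j share a random vector: i adds it, j subtracts it.
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks


def blind(embedding, mask):
    """Hide a local embedding by adding the party's mask."""
    return [(e + m) % MODULUS for e, m in zip(embedding, mask)]


def aggregate(blinded):
    """Sum the blinded embeddings; the masks cancel in the total."""
    return [sum(col) % MODULUS for col in zip(*blinded)]


rng = random.Random(42)
embeddings = [[10, 20, 30], [1, 2, 3], [100, 200, 300]]  # toy local embeddings
masks = pairwise_masks(len(embeddings), 3, rng)
blinded = [blind(e, m) for e, m in zip(embeddings, masks)]
aggregated = aggregate(blinded)
```

In this sketch the aggregator recovers only the sum of the embeddings, while each individual blinded vector looks random to it. A real deployment would derive the shared vectors from a key agreement between parties instead of a common random generator.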